Listing of Claims: 

1. (Currently amended) A method of identifying molecules for production, wherein the
molecules are represented by concatenated strings, said method comprising: 

i) encoding two or more biological molecules into a data structure of initial character 
strings to provide a collection of two or more different initial character strings wherein each of 
said biological molecules comprises at least about 10 subunits; 

ii) selecting at least two substrings from said initial character strings; 

iii) concatenating said substrings to form one or more product strings about the same 
length as one or more of the initial character strings; 

iv) adding the product strings to a data structure to populate a data structure of product
strings;

v) determining sequence identities of at least one of the product strings relative to at least 
one initial character string; and 

vi) selecting one or more product biological molecules for production, wherein the one or 
more product biological molecules correspond to one or more of the product strings having 
greater than 30% sequence identity with the at least one initial character string. 
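As an illustration only, steps (ii) through (vi) of claim 1 can be sketched in Python. The sketch assumes the molecules of step (i) have already been encoded as equal-length one-letter residue strings, uses a single crossover point per product, and computes sequence identity gaplessly (position by position) rather than after an alignment. The function names are hypothetical and are not taken from the claims.

```python
import random


def percent_identity(a: str, b: str) -> float:
    """Gapless percent identity: identical positions / length of the longer string.

    Assumption: the strings are already in register, so no alignment is done.
    """
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / max(len(a), len(b))


def shuffle_and_select(parents: list[str], n_products: int,
                       min_identity: float = 30.0, seed: int = 0):
    """Hypothetical sketch of steps (ii)-(vi) for pre-encoded parent strings."""
    rng = random.Random(seed)
    products: list[str] = []  # step (iv): data structure of product strings
    for _ in range(n_products):
        a, b = rng.sample(parents, 2)  # two different initial strings
        cut = rng.randint(1, min(len(a), len(b)) - 1)
        # steps (ii)-(iii): a prefix of a joined to a suffix of b has len(b)
        products.append(a[:cut] + b[cut:])
    # steps (v)-(vi): keep products above the identity threshold to some parent
    selected = [p for p in products
                if max(percent_identity(p, s) for s in parents) > min_identity]
    return products, selected


products, selected = shuffle_and_select(["MKTAYIAKQR", "MKSAYLAKQR"], n_products=5)
```

With two equal-length parents, every single-crossover product is identical to one parent on its prefix and to the other on its suffix, so it shares at least 50% identity with one of them and passes a 30% threshold. The filter becomes selective with more crossovers, added mutations, or more divergent parents.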

2. (Previously presented) The method of claim 1, wherein said encoding comprises 
encoding two or more nucleic acid sequences into said character strings. 

3. (Previously presented) The method of claim 2, wherein said two or more nucleic 
acid sequences comprise a nucleic acid sequence encoding a naturally occurring protein. 

4. (Previously presented) The method of claim 1, wherein said encoding comprises 
encoding two or more amino acid sequences into said character strings. 

5. (Previously presented) The method of claim 4, wherein said two or more amino 
acid sequences comprise an amino acid sequence encoding a naturally occurring protein. 

6. (Previously presented) The method of claim 1, wherein said initial character 
strings have at least 30% sequence identity with each other. 
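The condition that the initial strings share at least 30% identity with each other can be checked pairwise with a short Python sketch. Identity is assumed here to be gapless (computed on unaligned strings), which is a simplification; the helper names are hypothetical.

```python
from itertools import combinations


def gapless_identity(a: str, b: str) -> float:
    """Percent of positions with identical characters, over the longer string."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / max(len(a), len(b))


def all_pairs_meet_identity(parents: list[str], threshold: float = 30.0) -> bool:
    """True if every pair of initial strings has at least `threshold` % identity."""
    return all(gapless_identity(a, b) >= threshold
               for a, b in combinations(parents, 2))
```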

7. (Previously presented) The method of claim 1, wherein said selecting in (ii) 
comprises selecting at least one substring from an initial character string such that the ends of 
said substring occur in string regions of about 3 to about 20 characters in the initial character
string that have higher sequence identity with the corresponding region of another of said initial
character strings than the overall sequence identity between the two initial character strings. 
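One way to locate the string regions described in claim 7 is a sliding window over two strings that are already in register: report each window of 3 to 20 characters whose local identity exceeds the overall identity of the pair. This Python sketch assumes gapless, pre-aligned strings; the default window width and the function name are illustrative assumptions.

```python
def high_identity_windows(a: str, b: str, width: int = 5) -> list[int]:
    """Start indices of windows whose local identity exceeds the overall identity.

    Assumes a and b are already aligned position by position (no gaps).
    """
    if not 3 <= width <= 20:
        raise ValueError("width must be between 3 and 20 characters")
    n = min(len(a), len(b))
    if n < width:
        return []
    same = [a[i] == b[i] for i in range(n)]  # per-position identity flags
    overall = sum(same) / n
    return [s for s in range(n - width + 1)
            if sum(same[s:s + width]) / width > overall]
```

A substring end can then be drawn from any index inside a reported window.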

8. (Previously presented) The method of claim 1, wherein said selecting in (ii) 
comprises selecting substrings such that the ends of said substrings occur in predefined motifs of 
about 4 to about 8 characters. 
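For motif-based selection, a sketch could cut both parents immediately after a predefined motif of 4 to 8 characters that occurs at the same position in each. The Python below assumes equal-length, pre-aligned strings and a caller-supplied motif list; the function names are hypothetical.

```python
def motif_cut_points(a: str, b: str, motifs: list[str]) -> list[int]:
    """Positions just after a motif found at the same index in both strings."""
    cuts = set()
    for motif in motifs:
        if not 4 <= len(motif) <= 8:
            raise ValueError(f"motif {motif!r} must be 4 to 8 characters long")
        start = a.find(motif)
        while start != -1:
            if b.startswith(motif, start):  # shared, in-register occurrence
                cuts.add(start + len(motif))
            start = a.find(motif, start + 1)
    return sorted(cuts)


def motif_crossovers(a: str, b: str, motifs: list[str]) -> list[str]:
    """One product per shared motif: prefix of a up to the motif end, then b."""
    return [a[:c] + b[c:] for c in motif_cut_points(a, b, motifs)]
```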

9. (canceled) 

10. (Previously presented) The method of claim 1, wherein said selecting in (ii) 
comprises aligning two or more of said initial character strings to maximize pairwise identity 
between two or more substrings of the initial character strings, and selecting a character that is a 
member of an aligned pair for the end of one of the two or more substrings. 
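The alignment-based selection of claim 10 can be illustrated with a textbook Needleman-Wunsch global alignment, followed by a crossover at each column where the two aligned characters are identical. The scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions chosen for this sketch, not parameters taken from the claims; a production tool would typically use a substitution matrix and affine gap penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment; '-' marks a gap."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to recover one optimal alignment.
    i, j, ra, rb = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ra.append(a[i - 1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j - 1]); j -= 1
    return "".join(reversed(ra)), "".join(reversed(rb))


def aligned_pair_crossovers(a: str, b: str) -> list[str]:
    """Cross over after every identical aligned pair; gaps are removed afterwards."""
    ga, gb = global_align(a, b)
    products = []
    for k, (x, y) in enumerate(zip(ga, gb)):
        if x == y and x != "-":  # member of an identical aligned pair
            products.append((ga[:k + 1] + gb[k + 1:]).replace("-", ""))
    return products
```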

11. (canceled)

12. (Previously presented) The method of claim 1, wherein said method further 
comprises randomly altering one or more characters of said initial or product character strings. 

13. (Currently amended) The method of claim 12, wherein said method further 
comprises randomly selecting and altering one or more occurrences of a particular preselected 
character in said initial or product character strings. 
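Random alteration of characters (claim 12) and alteration restricted to randomly chosen occurrences of a preselected character (claim 13) can each be sketched in a few lines of Python. The 20-letter amino-acid alphabet and the function names are assumptions for illustration; a nucleotide alphabet could be passed in the same way.

```python
import random
from typing import Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: 20 standard residues


def mutate_random(s: str, n_changes: int, alphabet: str = AMINO_ACIDS,
                  rng: Optional[random.Random] = None) -> str:
    """Change n_changes distinct, randomly chosen positions to a different character."""
    rng = rng if rng is not None else random.Random()
    chars = list(s)
    for pos in rng.sample(range(len(chars)), n_changes):
        chars[pos] = rng.choice([c for c in alphabet if c != chars[pos]])
    return "".join(chars)


def mutate_preselected(s: str, target: str, n_changes: int,
                       alphabet: str = AMINO_ACIDS,
                       rng: Optional[random.Random] = None) -> str:
    """Randomly pick up to n_changes occurrences of `target` and replace each one."""
    rng = rng if rng is not None else random.Random()
    chars = list(s)
    sites = [i for i, c in enumerate(chars) if c == target]
    for pos in rng.sample(sites, min(n_changes, len(sites))):
        chars[pos] = rng.choice([c for c in alphabet if c != target])
    return "".join(chars)
```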

14. (Previously presented) The method of claim 1, wherein said encoding, selecting, 
or concatenating is performed on an internet site. 

15. (Previously presented) The method of claim 1, wherein said encoding, selecting, or 
concatenating is performed on a server. 

16. (Previously presented) The method of claim 1, wherein said encoding, selecting, or 
concatenating is performed on a client linked to a network. 

17. (Currently Amended) A computer program product on a computer readable media 
comprising computer code that: 

i) encodes two or more biological molecules into initial character strings to provide a 
collection of two or more different initial character strings wherein each of said biological 
molecules comprises at least about ten subunits; 

ii) selects at least two initial substrings from said initial character strings; 

iii) concatenates said substrings to form one or more product strings about the same 
length as one or more of the initial character strings; 

iv) adds the product strings to a data structure to populate a data structure of product
strings;

v) determines sequence identities of at least one of the product strings relative to at least 
one initial character string; and 

vi) selects one or more product biological molecules for production, wherein the one or 
more product biological molecules correspond to one or more of the product strings having 
greater than 30% sequence identity with the at least one initial character string. 

18. (Currently amended) The computer program product of claim 17, wherein said 
two or more biological molecules are nucleic acid sequences encoding naturally occurring
proteins.

19. (Previously presented) The computer program product of claim 17, wherein said 
two or more biological molecules are nucleic acid sequences encoding naturally occurring 
proteins. 

20. (Previously presented) The computer program product of claim 17, wherein said 
two or more biological molecules are amino acid sequences. 

21. (Currently amended) The computer program product of claim 17, wherein said
initial character strings have at least 30% sequence identity with each other.

22. (Previously presented) The computer program product of claim 17, wherein said 
computer code selects in (ii) at least one substring from an initial character string such that the 
ends of said substring occur in string regions of about three to about twenty characters in the 
initial character string that have higher sequence identity with a corresponding region of another 
of said initial character strings than the overall sequence identity between the two initial 
character strings.

23. (Previously presented) The computer program product of claim 17, wherein said 
computer code selects substrings such that the ends of said substrings occur in predefined motifs 
of about 4 to about 8 characters. 

24. (canceled) 

25. (Previously presented) The computer program product of claim 17, wherein the 
computer code selects substrings by aligning two or more of said initial character strings to 
maximize pairwise identity between two or more substrings of the character strings, and 
selecting a character that is a member of an aligned pair for the end of one substring. 

26. (canceled) 

27. (Currently amended) The computer program product of claim 17, wherein said 
computer code additionally randomly alters one or more characters of said initial or product 
character strings. 

28. (Currently amended) The computer program product of claim 27, wherein said 
computer code additionally randomly selects and alters one or more occurrences of a particular 
preselected character in said initial or product character strings. 

29. (Previously presented) The computer program product of claim 17, wherein said 
computer code is stored on media selected from the group consisting of magnetic media, optical 
media, and optomagnetic media. 

30. (Previously presented) The computer program product of claim 17, wherein said 
computer code is in dynamic or static memory of a computer. 

31-44. (canceled) 

45. (Currently amended) The method of claim 1, wherein the initial character strings 
of (i) are related in that they encode the same gene or protein family but differ in sequence.

46. (canceled) 

47. (Previously presented) The method of claim 1, further comprising determining a 
computationally predicted property for molecules represented by the product strings. 

48. (Previously presented) The method of claim 1, wherein the molecules represented 
by the product strings are made in parallel in an array of vessels. 

49. (Previously presented) The method of claim 1, wherein the molecules represented 
by the product strings are made by assembly of oligonucleotides. 

50. (canceled) 

51. (Currently amended) The computer program product of claim 17, wherein the
initial character strings of (i) are related in that they encode the same gene or protein family but
differ in sequence.

52. (Previously presented) The computer program product of claim 17, wherein the
code instructs physical screening of the molecule(s) represented by the product strings for one or 
more desired properties. 

53. (Previously presented) The computer program product of claim 17, wherein the
code instructs determination of a computationally predicted property for molecules represented 
by the product strings. 

54. (Canceled) 

55. (Canceled) 

56. (Canceled) 

57. (Currently amended) A method of identifying molecules for production, wherein 
the molecules are represented by concatenated strings, said method comprising: 

i) encoding two or more related biological molecules into a data structure of initial 
character strings to provide a collection of two or more different initial character strings wherein 
each of said biological molecules comprises at least about 10 subunits; 

ii) selecting at least two substrings from said initial character strings; 

iii) concatenating said substrings to form one or more product strings; 

iv) adding the product strings to a data structure to populate a data structure of product 
strings;

v) determining whether at least one of the product strings has at least a predefined
measure of similarity with at least one initial character string; and

vi) selecting one or more product biological molecules for production, wherein the one or 
more product biological molecules correspond to one or more of the product strings determined 
to have greater than the predefined value of sequence identity with at least one initial string. 

58. (Previously presented) The method of claim 1, wherein the one or more product
strings of (vi) have greater than 50% sequence identity with the at least one initial character 
string. 

59. (Previously presented) The method of claim 1, wherein the one or more product 
strings of (vi) have greater than 75% sequence identity with the at least one initial character 
string. 

60. (Previously presented) The method of claim 1, wherein the one or more product
strings of (vi) have greater than 85% sequence identity with the at least one initial character 
string. 

61. (Previously presented) The method of claim 1, wherein the one or more product 
strings of (vi) have greater than 90% sequence identity with the at least one initial character 
string. 

62. (Previously presented) The method of claim 1, wherein the one or more product 
strings of (vi) have greater than 95% sequence identity with the at least one initial character 
string. 

63. (Previously presented) The computer program product of claim 17, wherein the
one or more product strings of (vi) have greater than 50% sequence identity with the at least
one initial character string.

64. (Previously presented) The computer program product of claim 17, wherein the
one or more product strings of (vi) have greater than 75% sequence identity with the at least
one initial character string.

65. (Previously presented) The computer program product of claim 17, wherein the
one or more product strings of (vi) have greater than 95% sequence identity with the at least
one initial character string.

66. (Currently amended) A method of identifying molecules for production, wherein 
the molecules are represented by concatenated strings, said method comprising: 

i) encoding two or more biological molecules into a data structure of initial character 
strings to provide a collection of two or more different initial character strings wherein each of 
said biological molecules comprises at least about 10 subunits; 

ii) selecting at least two substrings from said initial character strings; 

iii) concatenating said substrings to form one or more product strings about the same 
length as one or more of the initial character strings; 

iv) adding the product strings to a data structure to populate a data structure of product
strings;

v) providing an alignment of at least one of the product strings relative to at least one 
initial character string; and

vi) selecting one or more product biological molecules for production, wherein the one 
or more product biological molecules correspond to one or more of the product strings having 
greater than 30% sequence identity with the at least one initial character string. 

67. (Previously presented) The method of claim 66, wherein said encoding comprises 
encoding two or more amino acid sequences into said character strings, and wherein said two or 
more amino acid sequences comprise an amino acid sequence encoding a naturally occurring 
protein. 

68. (Previously presented) The method of claim 66, wherein said initial character 
strings have at least 30% sequence identity with each other. 

69. (Previously presented) The method of claim 66, wherein said selecting in (ii) 
comprises selecting at least one substring from an initial character string such that the ends of 
said substring occur in string regions of about 3 to about 20 characters in the initial character 
string that have higher sequence identity with the corresponding region of another of said initial 
character strings than the overall sequence identity between the two initial character strings. 

70. (Previously presented) The method of claim 66, wherein said selecting in (ii) 
comprises selecting substrings such that the ends of said substrings occur in predefined motifs of 
about 4 to about 8 characters. 

71. (Previously presented) The method of claim 66, wherein said selecting in (ii)
comprises aligning two or more of said initial character strings to maximize pairwise identity 
between two or more substrings of the initial character strings, and selecting a character that is a 
member of an aligned pair for the end of one of the two or more substrings. 

72. (Previously presented) The method of claim 66, wherein said method further 
comprises randomly altering one or more characters of said initial or product character strings. 

73. (Previously presented) The method of claim 66, wherein the one or more product
strings of (vi) have greater than 50% sequence identity with the at least one initial character
string.

74. (Previously presented) The method of claim 66, wherein the one or more product
strings of (vi) have greater than 75% sequence identity with the at least one initial character
string.

75. (Previously presented) The method of claim 66, wherein the one or more product
strings of (vi) have greater than 85% sequence identity with the at least one initial character
string.

76. (Previously presented) The method of claim 66, wherein the one or more product
strings of (vi) have greater than 90% sequence identity with the at least one initial character
string.

77. (Previously presented) The method of claim 66, wherein the one or more product
strings of (vi) have greater than 95% sequence identity with the at least one initial character
string.

78. (Currently amended) A computer program product on a computer readable media 
comprising computer code that: 

i) encodes two or more biological molecules into initial character strings to provide a 
collection of two or more different initial character strings wherein each of said biological 
molecules comprises at least about ten subunits; 

ii) selects at least two initial substrings from said initial character strings; 

iii) concatenates said substrings to form one or more product strings about the same 
length as one or more of the initial character strings; 

iv) adds the product strings to a data structure to populate a data structure of product
strings;

v) provides an alignment of at least one of the product strings relative to at least one 
initial character string; and

vi) selects one or more product biological molecules for production, wherein the one or 
more product biological molecules correspond to one or more of the product strings having 
greater than 30% sequence identity with the at least one initial character string. 

79. (Previously presented) The computer program product of claim 78, wherein said 
computer code encodes two or more amino acid sequences into said character strings, and 
wherein said two or more amino acid sequences comprise an amino acid sequence encoding a 
naturally occurring protein. 

80. (Previously presented) The computer program product of claim 78, wherein said 
initial character strings have at least 30% sequence identity with each other. 

81. (Previously presented) The computer program product of claim 78, wherein said
computer code selects in (ii) at least one substring from an initial character string such that the
ends of said substring occur in string regions of about three to about twenty characters in the
initial character string that have higher sequence identity with a corresponding region of another
of said initial character strings than the overall sequence identity between the two initial
character strings.

82. (Previously presented) The computer program product of claim 78, wherein said 
computer code selects in (ii) by selecting substrings such that the ends of said substrings occur in 
predefined motifs of about 4 to about 8 characters. 

83. (Previously presented) The computer program product of claim 78, wherein said 
computer code selects in (ii) by aligning two or more of said initial character strings to maximize 
pairwise identity between two or more substrings of the initial character strings, and selecting a 
character that is a member of an aligned pair for the end of one of the two or more substrings. 

84. (Previously presented) The computer program product of claim 78, wherein said 
computer code further randomly alters one or more characters of said initial or product character 
strings. 

85. (Previously presented) The computer program product of claim 78, wherein the
one or more product strings of (vi) have greater than 50% sequence identity with the at least
one initial character string.

86. (Previously presented) The computer program product of claim 78, wherein the
one or more product strings of (vi) have greater than 75% sequence identity with the at least
one initial character string.

87. (Previously presented) The computer program product of claim 78, wherein the
one or more product strings of (vi) have greater than 85% sequence identity with the at least
one initial character string.

88. (Previously presented) The computer program product of claim 78, wherein the
one or more product strings of (vi) have greater than 90% sequence identity with the at least
one initial character string.

89. (Previously presented) The computer program product of claim 78, wherein the
one or more product strings of (vi) have greater than 95% sequence identity with the at least
one initial character string.

90. (Previously presented) A method of identifying molecules for production, wherein 
the molecules are represented by concatenated strings, said method comprising: 

i) encoding two or more naturally occurring biological molecules into a data structure of 
initial character strings to provide a collection of two or more different initial character strings 
wherein each of said biological molecules comprises at least about 10 subunits; 

ii) selecting at least two substrings from said initial character strings; 

iii) concatenating said substrings to form one or more product strings about the same 
length as one or more of the initial character strings; 

iv) adding the product strings to a data structure to populate a data structure of product 
strings; and 

v) selecting one or more product biological molecules for production, wherein the one or 
more product biological molecules correspond to one or more of the product strings having 
greater than 30% sequence identity with the at least one initial character string. 

91. (Previously presented) The method of claim 90, wherein said encoding comprises 
encoding two or more nucleic acid sequences into said character strings. 

92. (Previously presented) The method of claim 90, wherein said encoding comprises 
encoding two or more amino acid sequences into said character strings, and wherein said two or 
more amino acid sequences comprise an amino acid sequence encoding a naturally occurring 
protein. 

93. (Previously presented) The method of claim 90, wherein said initial character 
strings have at least 30% sequence identity with each other. 

94. (Previously presented) The method of claim 90, wherein said selecting in (ii) 
comprises selecting at least one substring from an initial character string such that the ends of 
said substring occur in string regions of about 3 to about 20 characters in the initial character 
string that have higher sequence identity with the corresponding region of another of said initial 
character strings than the overall sequence identity between the two initial character strings. 

95. (Previously presented) The method of claim 90, wherein said selecting in (ii) 
comprises selecting substrings such that the ends of said substrings occur in predefined motifs of 
about 4 to about 8 characters. 

96. (Previously presented) The method of claim 90, wherein said selecting in (ii) 
comprises aligning two or more of said initial character strings to maximize pairwise identity 
between two or more substrings of the initial character strings, and selecting a character that is a 
member of an aligned pair for the end of one of the two or more substrings. 

97. (Previously presented) The method of claim 90, wherein said method further 
comprises randomly altering one or more characters of said initial or product character strings. 

98. (Previously presented) The method of claim 90, wherein the one or more product
strings of (v) have greater than 50% sequence identity with the at least one initial character
string.

99. (Previously presented) The method of claim 90, wherein the one or more product
strings of (v) have greater than 75% sequence identity with the at least one initial character
string.

100. (Previously presented) The method of claim 90, wherein the one or more product
strings of (v) have greater than 85% sequence identity with the at least one initial character
string.

101. (Previously presented) The method of claim 90, wherein the one or more product
strings of (v) have greater than 90% sequence identity with the at least one initial character
string.

102. (Previously presented) The method of claim 90, wherein the one or more product
strings of (v) have greater than 95% sequence identity with the at least one initial character
string.

103. (Currently amended) A computer program product on a computer readable media 
comprising computer code that: 

i) encodes two or more naturally occurring biological molecules into initial character 
strings to provide a collection of two or more different initial character strings wherein each of 
said biological molecules comprises at least about ten subunits; 

ii) selects at least two initial substrings from said initial character strings; 

iii) concatenates said substrings to form one or more product strings about the same 
length as one or more of the initial character strings; 

iv) adds the product strings to a data structure to populate a data structure of product 
strings; and 

v) selects one or more product biological molecules for production, wherein the one or 
more product biological molecules correspond to one or more of the product strings having 
greater than 30% sequence identity with the at least one initial character string. 

104. (Previously presented) The computer program product of claim 103, wherein said
computer code encodes by encoding two or more nucleic acid sequences into said character 
strings. 

105. (Previously presented) The computer program product of claim 103, wherein said
computer code encodes two or more amino acid sequences into said character strings, and 
wherein said two or more amino acid sequences comprise an amino acid sequence encoding a 
naturally occurring protein. 

106. (Previously presented) The computer program product of claim 103, wherein said 
initial character strings have at least 30% sequence identity with each other. 

107. (Previously presented) The computer program product of claim 103, wherein said
computer code selects in (ii) at least one substring from an initial character string such that the
ends of said substring occur in string regions of about three to about twenty characters in the
initial character string that have higher sequence identity with a corresponding region of another
of said initial character strings than the overall sequence identity between the two initial
character strings.

108. (Previously presented) The computer program product of claim 103, wherein said 
computer code selects in (ii) by selecting substrings such that the ends of said substrings occur in 
predefined motifs of about 4 to about 8 characters. 

109. (Previously presented) The computer program product of claim 103, wherein said 
computer code selects in (ii) by aligning two or more of said initial character strings to maximize 
pairwise identity between two or more substrings of the initial character strings, and selecting a 
character that is a member of an aligned pair for the end of one of the two or more substrings. 

110. (Previously presented) The computer program product of claim 103, wherein said 
computer code further randomly alters one or more characters of said initial or product character 
strings. 

111. (Previously presented) The computer program product of claim 103, wherein the
one or more product strings of (v) have greater than 50% sequence identity with the at least
one initial character string.

112. (Previously presented) The computer program product of claim 103, wherein the
one or more product strings of (v) have greater than 75% sequence identity with the at least
one initial character string.

113. (Previously presented) The computer program product of claim 103, wherein the
one or more product strings of (v) have greater than 85% sequence identity with the at least
one initial character string.

114. (Previously presented) The computer program product of claim 103, wherein the
one or more product strings of (v) have greater than 90% sequence identity with the at least
one initial character string.

115. (Previously presented) The computer program product of claim 103, wherein the
one or more product strings of (v) have greater than 95% sequence identity with the at least
one initial character string.

116. (Currently amended) A method of identifying molecules for production, wherein
the molecules are represented by concatenated strings, said method comprising: 

i) encoding two or more biological molecules into a data structure of initial character 
strings to provide a collection of two or more different initial character strings wherein each of 
said biological molecules comprises at least about 10 subunits; 

ii) selecting at least two substrings from said initial character strings; 

iii) concatenating said substrings to form one or more product strings about the same 
length as one or more of the initial character strings; 

iv) adding the product strings to a data structure to populate a data structure of product
strings;

v) obtaining one or more computationally predicted properties for at least one of the 
product strings in the data structure; and 

vi) selecting one or more product biological molecules for production on the basis of the 
one or more computationally predicted properties. 

117. (Previously presented) The method of claim 116, wherein the computationally
predicted properties comprise one or more of a maximum or minimum molecular weight, a
maximum or minimum free energy, a maximum or minimum contact surface with a target
molecule or surface, a specified net charge, a predicted pK, a predicted pI, a binding avidity,
secondary form, and tertiary form. 
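Two of the listed properties, molecular weight and net charge, are simple enough to sketch directly. The Python below uses approximate average residue masses and a deliberately crude charge count (Lys and Arg as +1, Asp and Glu as -1, histidine and termini ignored), then keeps products inside caller-chosen property limits as in step (vi). The thresholds and function names are hypothetical, and a real pipeline would use a pKa-based charge model.

```python
# Approximate average residue masses in daltons (verify against a reference table).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water per chain (free N- and C-termini)


def molecular_weight(seq: str) -> float:
    """Sum of residue masses plus one water, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def crude_net_charge(seq: str) -> int:
    """Rough charge near pH 7: K/R count +1, D/E count -1."""
    return sum(seq.count(c) for c in "KR") - sum(seq.count(c) for c in "DE")


def select_by_properties(products: list[str], max_mw: float,
                         min_charge: int) -> list[str]:
    """Keep products whose predicted properties meet both limits."""
    return [p for p in products
            if molecular_weight(p) <= max_mw and crude_net_charge(p) >= min_charge]
```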

118. (Previously presented) The method of claim 116, wherein said encoding comprises
encoding two or more amino acid sequences into said character strings. 

119. (Previously presented) The method of claim 116, wherein said selecting in (ii)
comprises aligning two or more of said initial character strings to maximize pairwise identity 
between two or more substrings of the initial character strings, and selecting a character that is a 
member of an aligned pair for the end of one of the two or more substrings. 

120. (Previously presented) The method of claim 116, wherein said method further 
comprises randomly altering one or more characters of said initial or product character strings. 

121. (Previously presented) The method of claim 116, wherein the one or more product
biological molecules of (vi) have greater than 50% sequence identity with the at least one
initial character string.

122. (Previously presented) The method of claim 116, wherein the one or more product
biological molecules of (vi) have greater than 75% sequence identity with the at least one
initial character string.

123. (Previously presented) The method of claim 116, wherein the one or more product
biological molecules of (vi) have greater than 90% sequence identity with the at least one
initial character string.

124. (Currently amended) A computer program product on a computer readable media 
comprising computer code that: 

i) encodes two or more biological molecules into initial character strings to provide a 
collection of two or more different initial character strings wherein each of said biological 
molecules comprises at least about ten subunits; 

ii) selects at least two initial substrings from said initial character strings; 

iii) concatenates said substrings to form one or more product strings about the same 
length as one or more of the initial character strings; 

iv) adds the product strings to a data structure to populate a data structure of product
strings;

v) obtains one or more computationally predicted properties for at least one of the 
product strings in the data structure; and 

vi) selects one or more product biological molecules for production on the basis of the 
one or more computationally predicted properties. 

125. (Previously presented) The computer program product of claim 124, wherein the 
computationally predicted properties comprise one or more of a maximum or minimum 
molecular weight, a maximum or minimum free energy, a maximum or minimum contact surface 
with a target molecule or surface, a specified net charge, a predicted pK, a predicted pI, a binding
avidity, secondary form, and tertiary form. 

126. (Previously presented) The computer program product of claim 124, wherein the
computer code encodes in (i) by encoding two or more amino acid sequences into said character 
strings. 

127. (Previously presented) The computer program product of claim 124, wherein the 
computer code selects in (ii) by aligning two or more of said initial character strings to maximize 
pairwise identity between two or more substrings of the initial character strings, and selecting a 
character that is a member of an aligned pair for the end of one of the two or more substrings. 

128. (Previously presented) The computer program product of claim 124, wherein the 
computer code further randomly alters one or more characters of said initial or product character 
strings. 

129. (Previously presented) The computer program product of claim 124, wherein the 
one or more product biological molecules of (vi) having greater than 50% sequence identity with 
the at least one initial character string. 

130. (Previously presented) The computer program product of claim 124, wherein the 
one or more product biological molecules of (vi) having greater than 75% sequence identity with 
the at least one initial character string. 

131. (Previously presented) The computer program product of claim 124, wherein the 
one or more product biological molecules of (vi) having greater than 90% sequence identity with 
the at least one initial character string. 

132. (Currently amended) The method of claim 1, wherein adding the product strings 
to a data structure comprises adding more than one product string to the data structure. 

133. (Previously presented) The method of claim 1, wherein selecting at least two 
substrings from said initial character strings comprises random substring selection. 

134. (Previously presented) The method of claim 1, wherein selecting at least two 
substrings from said initial character strings comprises uniform substring selection. 

135. (Previously presented) The method of claim 1, wherein selecting at least two 
substrings from said initial character strings comprises motif-based selection. 

136. (Previously presented) The method of claim 1, wherein selecting at least two 
substrings from said initial character strings comprises alignment-based selection. 

137. (Previously presented) The method of claim 1, wherein selecting at least two 
substrings from said initial character strings comprises frequency-biased selection. 

138. (Currently amended) The computer program product of claim 17, wherein the 
computer code adds the product strings to a data structure by adding more than one product 
string to the data structure. 

139. (Previously presented) The computer program product of claim 17, wherein the 
computer code selects at least two substrings from said initial character strings by a random 
substring selection. 

140. (Previously presented) The computer program product of claim 17, wherein the 
computer code selects at least two substrings from said initial character strings by a uniform 
substring selection. 

141. (Previously presented) The computer program product of claim 17, wherein the 
computer code selects at least two substrings from said initial character strings by a motif-based 
selection. 

142. (Previously presented) The computer program product of claim 17, wherein the 
computer code selects at least two substrings from said initial character strings by an 
alignment-based selection. 

143. (Previously presented) The computer program product of claim 17, wherein the 
computer code selects at least two substrings from said initial character strings by a 
frequency-biased selection. 

144. (Currently amended) The method of claim 66, wherein adding the product strings 
to a data structure comprises adding more than one product string to the data structure. 

145. (Previously presented) The method of claim 66, wherein selecting at least two 
substrings from said initial character strings comprises random substring selection. 

146. (Previously presented) The method of claim 66, wherein selecting at least two 
substrings from said initial character strings comprises uniform substring selection. 

147. (Previously presented) The method of claim 66, wherein selecting at least two 
substrings from said initial character strings comprises motif-based selection. 

148. (Previously presented) The method of claim 66, wherein selecting at least two 
substrings from said initial character strings comprises alignment-based selection. 

149. (Previously presented) The method of claim 66, wherein selecting at least two 
substrings from said initial character strings comprises frequency-biased selection. 

150. (Currently amended) The computer program product of claim 78, wherein the 
computer code adds the product strings to a data structure by adding more than one product 
string to the data structure. 

151. (Previously presented) The computer program product of claim 78, wherein the 
computer code selects at least two substrings from said initial character strings by a random 
substring selection. 

152. (Previously presented) The computer program product of claim 78, wherein the 
computer code selects at least two substrings from said initial character strings by a uniform 
substring selection. 

153. (Previously presented) The computer program product of claim 78, wherein the 
computer code selects at least two substrings from said initial character strings by a motif-based 
selection. 

154. (Previously presented) The computer program product of claim 78, wherein the 
computer code selects at least two substrings from said initial character strings by an 
alignment-based selection. 

155. (Previously presented) The computer program product of claim 78, wherein the 
computer code selects at least two substrings from said initial character strings by a 
frequency-biased selection. 

156. (Currently amended) The method of claim 90, wherein adding the product strings 
to a data structure comprises adding more than one product string to the data structure. 

157. (Previously presented) The method of claim 90, wherein selecting at least two 
substrings from said initial character strings comprises random substring selection. 

158. (Previously presented) The method of claim 90, wherein selecting at least two 
substrings from said initial character strings comprises uniform substring selection. 

159. (Previously presented) The method of claim 90, wherein selecting at least two 
substrings from said initial character strings comprises motif-based selection. 

160. (Previously presented) The method of claim 90, wherein selecting at least two 
substrings from said initial character strings comprises alignment-based selection. 

161. (Previously presented) The method of claim 90, wherein selecting at least two 
substrings from said initial character strings comprises frequency-biased selection. 

162. (Currently amended) The computer program product of claim 103, wherein the 
computer code adds the product strings to a data structure by adding more than one product 
string to the data structure. 

163. (Previously presented) The computer program product of claim 103, wherein the 
computer code selects at least two substrings from said initial character strings by a random 
substring selection. 

164. (Previously presented) The computer program product of claim 103, wherein the 
computer code selects at least two substrings from said initial character strings by a uniform 
substring selection. 

165. (Previously presented) The computer program product of claim 103, wherein the 
computer code selects at least two substrings from said initial character strings by a motif-based 
selection. 

166. (Previously presented) The computer program product of claim 103, wherein the 
computer code selects at least two substrings from said initial character strings by an 
alignment-based selection. 

167. (Previously presented) The computer program product of claim 103, wherein the 
computer code selects at least two substrings from said initial character strings by a 
frequency-biased selection. 

168. (Currently amended) The method of claim 116, wherein adding the product 
strings to a data structure comprises adding more than one product string to the data 
structure. 

169. (Previously presented) The method of claim 116, wherein selecting at least two 
substrings from said initial character strings comprises random substring selection. 

170. (Previously presented) The method of claim 116, wherein selecting at least two 
substrings from said initial character strings comprises uniform substring selection. 

171. (Previously presented) The method of claim 116, wherein selecting at least two 
substrings from said initial character strings comprises motif-based selection. 

172. (Previously presented) The method of claim 116, wherein selecting at least two 
substrings from said initial character strings comprises alignment-based selection. 

173. (Previously presented) The method of claim 116, wherein selecting at least two 
substrings from said initial character strings comprises frequency-biased selection. 

174. (Currently amended) The computer program product of claim 124, wherein the 
computer code adds the product strings to a data structure by adding more than one product 
string to the data structure. 

175. (Previously presented) The computer program product of claim 124, wherein the 
computer code selects at least two substrings from said initial character strings by a random 
substring selection. 

176. (Previously presented) The computer program product of claim 124, wherein the 
computer code selects at least two substrings from said initial character strings by a uniform 
substring selection. 

177. (Previously presented) The computer program product of claim 124, wherein the 
computer code selects at least two substrings from said initial character strings by a motif-based 
selection. 

178. (Previously presented) The computer program product of claim 124, wherein the 
computer code selects at least two substrings from said initial character strings by an 
alignment-based selection. 

179. (Previously presented) The computer program product of claim 124, wherein the 
computer code selects at least two substrings from said initial character strings by a 
frequency-biased selection. 